
Introduction

Welcome to FLYDATA, where we transform aviation data into actionable insights. This demonstration showcases our analytical capabilities with an exploration of the nycflights23 dataset, which covers flights departing New York City airports in 2023. We address questions such as:

  1. What are the mean, median, and standard deviation of departure delay, arrival delay, and air time?
  2. What is the number of flights for each airline?
  3. Are there seasonal trends (e.g. in the number of flights per month)?
  4. How do flight departure delays vary throughout the day?
  5. What is the relationship between air time and distance?
  6. Which routes see the most flights, and how can this inform operational decisions?
  7. How do weather conditions, particularly at key hubs, impact flight schedules?
  8. Are certain days more prone to operational disruptions, and how can this optimize staffing?
  9. What insights can be derived from analyzing aircraft age in relation to flight frequency?
  10. What factors influence customer satisfaction?

1. Descriptive Statistics

Our Capabilities

At FLYDATA, we specialize in transforming raw aviation data into actionable insights that empower our customers to optimize their operations and enhance customer satisfaction. By providing critical statistics such as the average, median, and standard deviation for departure delay, arrival delay, and air time, we enable our clients to gain a comprehensive understanding of their flight punctuality and operational efficiency.

Analysis

# Summarize the mean, median, and standard deviation for departure delay, arrival delay, and air time
# Build the summary with dplyr verbs and display it as a table
flights_summary <- flights |>
  reframe(
    averages = c(mean(dep_delay, na.rm = TRUE),
                 mean(arr_delay, na.rm = TRUE),
                 mean(air_time, na.rm = TRUE)),
    median = c(median(dep_delay, na.rm = TRUE),
               median(arr_delay, na.rm = TRUE),
               median(air_time, na.rm = TRUE)),
    standard_deviation = c(sd(dep_delay, na.rm = TRUE),
                           sd(arr_delay, na.rm = TRUE),
                           sd(air_time, na.rm = TRUE))
  ) |> mutate(variable = c("Departure Delay","Arrival Delay","Airtime"))

# Print the summary
#print(flights_summary)

# Print the summary (in a nicer looking way)
kable(flights_summary, "html") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
averages median standard_deviation variable
13.837372 -2 54.31385 Departure Delay
4.344803 -10 57.86889 Arrival Delay
141.820258 121 89.17256 Airtime
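
The mean departure delay (about 13.8 minutes) sits well above the median (-2 minutes). That pattern is typical of a right-skewed distribution: most flights leave close to schedule, while a small number of very long delays pull the mean upward. The following minimal sketch uses made-up delay values (they are not taken from the dataset) to illustrate the effect in base R:

```r
# Illustrative values only (not from nycflights23): most flights leave
# slightly early or on time, one flight is delayed by three hours, and
# one value is missing (as for a cancelled flight).
toy_delays <- c(-5, -3, -2, -2, 0, 1, 4, 180, NA)

toy_summary <- c(
  mean   = mean(toy_delays, na.rm = TRUE),
  median = median(toy_delays, na.rm = TRUE),
  sd     = sd(toy_delays, na.rm = TRUE)
)

# The single long delay raises the mean far above the median.
print(toy_summary)
```

For skewed delay data, the median is usually the better description of a typical flight, while the mean and standard deviation reflect the weight of the extreme cases.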

2. Number of Flights Analysis

Our Capabilities

Knowing the total flight count for each airline provides our clients with crucial information to assess market share, identify competitive strengths, and spot potential opportunities for collaboration or strategic alliances. This data allows airlines to benchmark themselves against competitors, evaluate their fleet utilization, and optimize route planning.

Analysis

flights_by_airline <- flights |>
  group_by(carrier) |>
  summarize(number_of_flights = n()) |>
  arrange(desc(number_of_flights))

# Join with airlines for full name and print
flights_by_airline <- flights_by_airline |>
  left_join(airlines, by = "carrier")

# Print the summary
# print(flights_by_airline)

# Print the summary (in a nicer looking way)
kable(flights_by_airline, "html") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
carrier number_of_flights name
YX 88785 Republic Airline
UA 79641 United Air Lines Inc. 
B6 66169 JetBlue Airways
DL 61562 Delta Air Lines Inc. 
9E 54141 Endeavor Air Inc. 
AA 40525 American Airlines Inc. 
NK 15189 Spirit Air Lines
WN 12385 Southwest Airlines Co. 
AS 7843 Alaska Airlines Inc. 
OO 6432 SkyWest Airlines Inc. 
F9 1286 Frontier Airlines Inc. 
G4 671 Allegiant Air
HA 366 Hawaiian Airlines Inc. 
MQ 357 Envoy Air
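
Raw counts become easier to compare once they are expressed as shares of all departures. The sketch below is one possible way to do this in base R; it copies the counts from the table above rather than recomputing them from `flights`, and the column name `share_pct` is our own choice.

```r
# Flight counts copied from the table above (carrier code -> number of flights).
counts <- c(YX = 88785, UA = 79641, B6 = 66169, DL = 61562, `9E` = 54141,
            AA = 40525, NK = 15189, WN = 12385, AS = 7843, OO = 6432,
            F9 = 1286, G4 = 671, HA = 366, MQ = 357)

# Fraction of all departures operated by each carrier.
share <- counts / sum(counts)

share_table <- data.frame(
  carrier           = names(counts),
  number_of_flights = unname(counts),
  share_pct         = round(100 * unname(share), 1)  # hypothetical column name
)

print(share_table)
```

Within the dplyr pipeline used above, the same column could be added with `mutate(share_pct = 100 * number_of_flights / sum(number_of_flights))`.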

4. Departure and Arrival Delays throughout the Day

Our Capabilities

By showing how average departure and arrival delays change over the course of the day, we help airlines and airports optimize their schedules, enhance ground operations, and minimize delays. This data-driven approach enhances operational effectiveness and improves the passenger experience by reducing wait times and keeping schedules reliable.

Analysis

delay_by_hour <- flights |>
  group_by(hour) |>
  summarize(avg_dep_delay = mean(dep_delay, na.rm = TRUE),
            avg_arr_delay = mean(arr_delay, na.rm = TRUE))

# Plot average departure and arrival delay per scheduled departure hour
delay_by_hour |>
  ggplot() +
  geom_line(aes(x = hour, y = avg_dep_delay, color = "Departure"), linewidth = 1) +
  geom_line(aes(x = hour, y = avg_arr_delay, color = "Arrival"), linewidth = 1) +
  labs(
    title = "Average Delay by Hour",
    x = "Hour of Day", 
    y = "Average Delay (minutes)",
    color = "Type"
  ) +
  theme_minimal() -> delay_by_hour_plot

ggplotly(delay_by_hour_plot)
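
The `hour` column used for grouping is the scheduled departure hour. To our understanding, the nycflights datasets store scheduled times as HHMM integers, so the hour and minute can be recovered with integer division and modulo. A minimal sketch with made-up times (the vector below is illustrative, not read from `flights`):

```r
# Made-up scheduled departure times in HHMM form (5:15, 13:59, 23:59).
sched_dep_time <- c(515, 1359, 2359)

# Integer division by 100 keeps the hour; the remainder is the minute.
hour   <- sched_dep_time %/% 100
minute <- sched_dep_time %% 100

print(data.frame(sched_dep_time, hour, minute))
```

Because the grouping is by scheduled rather than actual departure hour, a flight scheduled for 17:30 that leaves at 19:00 contributes its 90-minute delay to hour 17.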

5. The Relationship between Air Time and Distance

Our Capabilities

By offering a detailed analysis of the air time and distance relationship, we enable airlines to optimize route planning, improve flight scheduling, and enhance overall operational performance, ensuring a balance between efficiency, cost, and customer satisfaction.

Analysis

air_time_distance_plot <- flights |>
  ggplot(aes(x = distance, y = air_time)) +
  geom_point(alpha = 0.3, color = "blue") +
  geom_smooth(method = "lm", col = "red") +
  labs(title = "Air Time vs. Distance", x = "Distance (miles)", y = "Air Time (minutes)") +
  theme_light()

air_time_distance_plot
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 12534 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 12534 rows containing missing values or values outside the scale range
## (`geom_point()`).
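
The red line in the plot is an ordinary least-squares fit, and the warnings report rows without a recorded air time. The sketch below shows how the same kind of fit could be inspected numerically. It uses made-up values constructed so that air time equals 20 minutes plus 1 minute per 8 miles; none of the numbers come from the dataset.

```r
# Made-up flights: air_time = 20 + distance / 8, plus one row with a
# missing air time to mimic a flight that never departed.
toy <- data.frame(
  distance = c(200, 500, 1000, 1500, 2500, 800),
  air_time = c(45, 82.5, 145, 207.5, 332.5, NA)
)

# lm() drops rows with missing values by default (na.action = na.omit),
# which parallels the "Removed ... rows" warnings from ggplot2.
fit <- lm(air_time ~ distance, data = toy)

intercept <- coef(fit)[["(Intercept)"]]  # fixed overhead in minutes
slope     <- coef(fit)[["distance"]]     # minutes of air time per mile

# Converting the slope to miles per hour gives an implied cruise speed.
implied_speed_mph <- 60 / slope

print(c(intercept = intercept, slope = slope, implied_speed_mph = implied_speed_mph))
```

Read this way, the intercept approximates the time spent climbing and descending regardless of distance, and the slope reflects cruise speed.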

6. Busiest Routes

Question

FLYDATA identifies the busiest routes out of New York City. Insights into high-traffic corridors can drive strategic planning for airlines and airports.

Analysis

# Count flights per origin-destination pair; .groups = "drop" returns an
# ungrouped result and avoids the grouping message from summarize()
busiest_routes <- flights |>
  group_by(origin, dest) |>
  summarize(number_of_flights = n(), .groups = "drop") |>
  arrange(desc(number_of_flights))
knitr::kable(head(busiest_routes, 10), caption = "Top 10 Busiest Routes from NYC")
Top 10 Busiest Routes from NYC
origin dest number_of_flights
JFK LAX 10045
LGA ORD 9923
LGA BOS 8217
LGA ATL 7883
JFK SFO 7440
EWR MCO 7262
JFK BOS 6432
LGA DFW 5972
JFK MIA 5930
EWR ATL 5915

Visualization

busiest_routes |>
  slice_max(number_of_flights, n = 10) |>
  ggplot(aes(x = reorder(paste(origin, dest, sep = " - "), number_of_flights), y = number_of_flights)) +
  geom_bar(stat = "identity", fill = "lightgreen") +
  coord_flip() +
  labs(title = "Top 10 Busiest Routes from NYC", x = "Route", y = "Number of Flights") +
  theme_minimal() -> busiest_routes_plot

ggplotly(busiest_routes_plot)

Findings

Our exploration identifies JFK to LAX (10,045 flights) and LGA to ORD (9,923 flights) as the busiest routes, followed by further high-traffic corridors to BOS, ATL, and SFO. This information aids in optimizing fleet allocation and scheduling.

7. Weather Impact on Delays

Question

FLYDATA assesses how weather, here wind speed at JFK, relates to departure delays, with the aim of strengthening operational resilience.

Analysis

jfk_weather_delay <- flights |>
  filter(origin == "JFK") |>
  left_join(weather, by = c("origin", "year", "month", "day", "hour")) |>
  group_by(date = as.Date(paste(year, month, day, sep = "-"))) |>
  summarize(avg_dep_delay = mean(dep_delay, na.rm = TRUE),
            avg_wind_speed = mean(wind_speed, na.rm = TRUE))

knitr::kable(head(jfk_weather_delay), caption = "Sample Data of JFK Wind Speed Impact on Delays")
Sample Data of JFK Wind Speed Impact on Delays
date avg_dep_delay avg_wind_speed
2023-01-01 18.764045 10.874225
2023-01-02 45.703833 7.068520
2023-01-03 38.898649 4.901329
2023-01-04 32.215488 5.684622
2023-01-05 11.787879 6.163238
2023-01-06 7.862876 5.199678

Visualization

jfk_weather_delay |>
  ggplot(aes(x = avg_wind_speed, y = avg_dep_delay)) +
  geom_point(alpha = 0.5, color = "orange") +
  geom_smooth(method = "lm", col = "darkgreen") +
  labs(title = "Impact of Wind Speed on Departure Delay at JFK", x = "Average Wind Speed (miles/hour)", y = "Average Departure Delay (minutes)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Findings

The fitted trend line indicates a positive correlation between daily average wind speed and daily average departure delay at JFK. We recommend leveraging weather data to anticipate delays and refine scheduling protocols.
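
A correlation coefficient is one way to put a number on the association between wind speed and departure delay. The sketch below uses made-up daily averages (not the JFK data) and the hypothetical helper name `wind_delay_cor`; applied to the real summary table, it would take that table's `avg_wind_speed` and `avg_dep_delay` columns.

```r
# Hypothetical helper: Pearson correlation between two columns,
# ignoring days where either value is missing.
wind_delay_cor <- function(wind_speed, dep_delay) {
  cor(wind_speed, dep_delay, use = "complete.obs")
}

# Made-up daily averages for illustration only.
toy_days <- data.frame(
  avg_wind_speed = c(2, 4, 6, 8, 10, NA),
  avg_dep_delay  = c(5, 9, 8, 15, 20, 12)
)

r <- wind_delay_cor(toy_days$avg_wind_speed, toy_days$avg_dep_delay)
print(r)
```

A coefficient near 0 would indicate that wind speed alone explains little of the day-to-day variation, even if the fitted line slopes upward; `cor.test()` can additionally report a confidence interval.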

8. Delays by Day of the Week

Question

FLYDATA investigates whether certain days of the week experience longer delays, which would point to opportunities for more efficient staffing and scheduling.

Analysis

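# Store weekdays as a factor ordered Monday to Sunday so that both the
# table and the plot keep calendar order instead of alphabetical order.
# Note: weekdays() returns names in the session locale; English is assumed here.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

flights_dayofweek <- flights |>
  mutate(weekday = factor(weekdays(as.Date(paste(year, month, day, sep = "-"))), levels = day_levels)) |>
  group_by(weekday) |>
  summarize(avg_dep_delay = mean(dep_delay, na.rm = TRUE))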

knitr::kable(flights_dayofweek, caption = "Average Departure Delay by Day of the Week")
Average Departure Delay by Day of the Week
weekday avg_dep_delay
Monday 14.73471
Tuesday 10.85576
Wednesday 10.60587
Thursday 11.90029
Friday 16.58209
Saturday 15.84804
Sunday 16.99358

Visualization

flights_dayofweek |>
  ggplot(aes(x = weekday, y = avg_dep_delay)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  labs(title = "Average Departure Delay by Day of the Week", x = "Day of Week", y = "Average Departure Delay (minutes)") +
  theme_minimal()
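
ggplot2 orders a character x-axis alphabetically, so weekday names need to be stored as a factor with explicit levels to appear in calendar order. In addition, `weekdays()` returns names in the session locale. The sketch below shows one locale-independent way to derive English weekday names in base R; the three dates are arbitrary examples.

```r
# Desired display order for the x-axis and tables.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Arbitrary example dates: a Sunday, a Monday, and a Friday in January 2023.
dates <- as.Date(c("2023-01-01", "2023-01-02", "2023-01-06"))

# as.POSIXlt()$wday is numeric (0 = Sunday ... 6 = Saturday) and does not
# depend on the locale; shift it so that Monday maps to position 1.
wday_index <- (as.POSIXlt(dates)$wday + 6) %% 7 + 1

# A factor with explicit levels keeps Monday-to-Sunday order when sorting,
# summarizing, or plotting.
weekday <- factor(day_levels[wday_index], levels = day_levels)

print(sort(weekday))
```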

Findings

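Average departure delays are highest from Friday through Sunday (about 16 to 17 minutes) and lowest on Tuesday and Wednesday (under 11 minutes), suggesting a strategic focus on the end of the week and the weekend for staffing and resource allocation.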

9. Aircraft Age Analysis

Question

Our analysis examines the relationship between aircraft age and usage, providing insights into fleet management strategies.

Analysis

plane_ages <- flights |>
  left_join(planes, by = "tailnum") |>
  mutate(plane_age = 2023 - year.y) |>
  group_by(plane_age) |>
  summarize(number_of_flights = n())

knitr::kable(filter(plane_ages, !is.na(plane_age)), caption = "Flights by Aircraft Age in 2023")
Flights by Aircraft Age in 2023
plane_age number_of_flights
0 9004
1 15393
2 10007
3 9323
4 14729
5 11362
6 20087
7 13653
8 19259
9 32598
10 24685
11 7089
12 5109
13 7061
14 11313
15 44442
16 30558
17 16239
18 15511
19 9622
20 5580
21 9691
22 15163
23 16022
24 15825
25 11340
26 3733
27 1801
28 1704
29 3494
30 993
31 1455
32 1243
33 1204

Visualization

plane_ages |>
  filter(!is.na(plane_age)) |>
  ggplot(aes(x = plane_age, y = number_of_flights)) +
  geom_col(fill = "coral") +
  labs(title = "Flights by Aircraft Age in 2023", x = "Aircraft Age (years)", y = "Number of Flights") +
  theme_minimal()

Findings

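Our findings reveal that usage is spread across a wide range of aircraft ages rather than concentrated in new aircraft. Flight counts peak for aircraft aged 15 years (44,442 flights), 9 years (32,598), and 16 years (30,558), while every age above 25 years accounts for fewer than 4,000 flights. These patterns inform maintenance and fleet strategies to enhance safety and cost-efficiency.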
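
Grouping single years into age bands can make the distribution easier to read. The sketch below is one possible approach in base R; it copies the counts from the table above, and the band boundaries are an arbitrary choice made for illustration.

```r
# Flight counts by aircraft age (0 to 33 years), copied from the table above.
flights_by_age <- c(
  9004, 15393, 10007, 9323, 14729, 11362, 20087, 13653, 19259, 32598,
  24685, 7089, 5109, 7061, 11313, 44442, 30558, 16239, 15511, 9622,
  5580, 9691, 15163, 16022, 15825, 11340, 3733, 1801, 1704, 3494,
  993, 1455, 1243, 1204
)
plane_age <- 0:33

# Assumed bands (not from the source); intervals are closed on the right,
# so (-1, 9] covers ages 0 through 9.
age_band <- cut(plane_age,
                breaks = c(-1, 9, 19, 29, 33),
                labels = c("0-9", "10-19", "20-29", "30-33"))

# Total flights and percentage share per band.
flights_by_band <- tapply(flights_by_age, age_band, sum)
share_by_band   <- round(100 * flights_by_band / sum(flights_by_band), 1)

print(data.frame(flights = flights_by_band, share_pct = share_by_band))
```

Flights whose tail number has no match or no manufacture year in `planes` are excluded here, as they are in the table.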

10. Influences on Customer Satisfaction

Question

Customer satisfaction is a multidimensional aspect that’s pivotal for the sustained success and reputation of airlines and airports. At FLYDATA, we analyze various factors influencing customer satisfaction to provide our clients with insights for enhancing passenger experiences and ensuring customer loyalty. Here’s a look at some key insights into customer satisfaction:

Analysis

Visualization

Conclusion

Wind-Up!

This demonstration from FLYDATA illustrates how our data-driven insights can shape operational excellence in the aviation industry. From understanding route dynamics to optimizing aircraft usage, FLYDATA empowers stakeholders to make informed decisions. For bespoke analysis and deeper dives into your specific needs, connect with our team to explore customized solutions.